Project Accomplishments:

This project focused on distributed control and information fusion/decentralized learning over
communication systems. The first set of problems we considered relates to distributed adaptive control.
These are formulated as multi-armed bandit problems. A decentralized learning algorithm, called dUCB4,
was proposed for multi-player multi-armed bandit models and achieves poly-log regret. This is the first
decentralized learning algorithm with sublinear regret. Later, another algorithm, the phased exploration
and exploitation (PEE) algorithm, was proposed; it builds on dUCB4 and achieves improved performance,
with log regret. Two conference papers have been published, and a journal paper on this work is under
review at the IEEE Trans. on Information Theory. Another is in preparation for submission to the IEEE
Trans. on Automatic Control. This has led us to address the problem in which there are multiple
controllers, each acting autonomously. This is formulated as a multi-armed bandit game. Some progress
has been made on this problem as well.
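
For readers unfamiliar with the regret notion used above, the following is a minimal sketch of the classical single-player UCB1 index policy on Bernoulli arms. It is not dUCB4 or PEE, whose details are not given in this summary; the arm means, horizon, and function name are hypothetical choices made only for illustration. The returned regret is the cumulative gap between the best arm's mean and the mean of the arm actually pulled.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Single-player UCB1 on Bernoulli arms (illustrative; not dUCB4 or PEE).

    `means` are hypothetical arm success probabilities, unknown to the learner.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total observed reward of each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull every arm once
        else:
            # index = empirical mean + confidence radius
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, round(regret, 1))
```

A policy that picked arms uniformly at random would incur regret growing linearly in the horizon; index policies of this type keep it logarithmic in the single-player setting. The decentralized, multi-player case addressed in the project is harder because players must also avoid colliding on the same arm.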

The second set of problems we considered relates to decentralized control in multi-user communication
networks. A model was considered in which multiple controllers exchange their information
with other controllers, but with some delay. This is a classical decentralized control problem that has
remained open for over 40 years. An asymmetric delayed sharing pattern was considered, for which it is
known that the controllers have a 'partially nested information structure' and a linear optimal controller
exists. The effort was on computing this linear optimal controller. Progress on this class of problems had
been stalled for more than 30 years, until recently. Two important information sharing pattern problems
were solved. This work has resulted in a submitted journal paper and two conference papers.

The third set of problems we considered was the development of a new approximate dynamic programming
method, which we call the 'empirical dynamic programming (EDP)' algorithm. Here, every expectation is
replaced by a sample average approximation. The open question has been whether such an algorithm
converges and, if so, whether it converges to the optimal policy. We have been able to prove
convergence to the optimal policy in a probabilistic sense. This required new conceptual developments on
probabilistic fixed points of random operators. Numerical results show that EDP performs better than
reinforcement learning and other stochastic approximation methods. Two journal papers are to be
submitted soon, to Mathematics of Operations Research and Automatica respectively. In addition, two
conference papers will be submitted later.
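
The idea of replacing each expectation by a sample average can be illustrated on value iteration. The sketch below is only an assumption-laden illustration, not the EDP algorithm as analyzed in the papers: the two-state, two-action MDP (`P`, `R`), the discount factor `GAMMA`, and the sample size are all hypothetical. Each iteration draws fresh simulated next states and averages the current value estimate over them instead of computing the exact expectation.

```python
import random

GAMMA = 0.8  # hypothetical discount factor
# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
R = [[0.2, 0.0], [1.0, 0.5]]
STATES = range(2)
ACTIONS = range(2)


def exact_value_iteration(iters=200):
    """Standard value iteration using the exact expectation over next states."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [
            max(
                R[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))
                for a in ACTIONS
            )
            for s in STATES
        ]
    return v


def empirical_value_iteration(n_samples=200, iters=60, seed=0):
    """Value iteration with each expectation replaced by a sample average."""
    rng = random.Random(seed)
    v = [0.0, 0.0]
    for _ in range(iters):
        new_v = []
        for s in STATES:
            q = []
            for a in ACTIONS:
                # Simulate n_samples next states and average the current values.
                nxt = rng.choices(STATES, weights=P[s][a], k=n_samples)
                q.append(R[s][a] + GAMMA * sum(v[s2] for s2 in nxt) / n_samples)
            new_v.append(max(q))
        v = new_v
    return v


v_exact = exact_value_iteration()
v_emp = empirical_value_iteration()
print(v_exact, v_emp)
```

Because the iterates are random, they do not settle at a single fixed point; they fluctuate in a neighborhood of the exact value function whose size shrinks as the number of samples grows, which is why convergence has to be stated in a probabilistic sense.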

The project has involved and supported two PhD students (Dileep Kalathil and Naumaan Nayyar),
and partially supported two postdocs (Arman Khouzani and Will Haskell) and the PI. Dileep Kalathil's PhD
dissertation is expected to be submitted in May 2013 and will largely be a compilation of papers [1], [3],
[4], [5], all work done under this project. Naumaan Nayyar's dissertation is expected to be completed by
May 2014 and will include results in [2]. In all, the project has resulted in 5 journal
publications/submissions and 4 conference papers/submissions so far.